28 research outputs found

    Acoustic-Phonetic Approaches for Improving Segment-Based Speech Recognition for Large Vocabulary Continuous Speech

    Segment-based speech recognition has been shown to be a competitive alternative to state-of-the-art HMM-based techniques. Its accuracy relies heavily on the quality of the segment graph from which the recognizer searches for the most likely recognition hypotheses. To increase the inclusion rate of actual segments in the graph, it is important to recover segments missed by the segment-based segmentation algorithm. One aspect of this research focuses on recovering segments lost through missed detection of segment boundaries: acoustic discontinuities, together with manner-distinctive features, are used to recover these missing segments. Another improvement to our segment-based framework addresses the limited amount of training speech data, which prevents the use of more complex covariance matrices for the acoustic models. Feature dimensionality reduction in the form of Principal Component Analysis (PCA) is applied to enable the training of full covariance matrices, resulting in improved segment-based phoneme recognition. Furthermore, to exploit the fact that the segment-based approach allows the integration of phonetic knowledge, we incorporate the probability of each segment belonging to a specific manner of articulation into the scoring of the segment graphs. Our experiments show that, with the proposed improvements, our segment-based framework increases phoneme recognition accuracy by approximately 25% relative to the baseline segment-based speech recognizer.
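    A minimal sketch of the dimensionality-reduction idea described above, assuming hypothetical segment feature vectors `X` and phoneme labels `y`: PCA compresses the features so that a full covariance matrix per phoneme class can be estimated from limited data. The scikit-learn classes and all parameter values are illustrative, not taken from the paper.

```python
# Sketch: PCA-based dimensionality reduction so that full covariance
# matrices become trainable from a limited amount of speech data.
# X, y, and n_components are hypothetical placeholders.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def train_segment_models(X, y, n_components=20):
    """X: (n_segments, n_features) acoustic features; y: phoneme labels."""
    pca = PCA(n_components=n_components)
    X_red = pca.fit_transform(X)              # reduce feature dimension first

    models = {}
    for phone in np.unique(y):
        gmm = GaussianMixture(n_components=1, covariance_type='full')
        gmm.fit(X_red[y == phone])            # full covariance is now feasible
        models[phone] = gmm
    return pca, models

def score_segment(pca, models, x):
    """Log-likelihood of one segment feature vector under each phone model."""
    x_red = pca.transform(x.reshape(1, -1))
    return {p: m.score(x_red) for p, m in models.items()}
```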

    Belief-Based Nonlinear Rescoring in Thai Speech Understanding

    This paper proposes an approach to improving speech understanding based on rescoring of N-best semantic hypotheses. In rescoring, probabilities produced by an understanding component are combined with additional probabilities derived from system beliefs. While the usual rescoring approach is to multiply the two or to interpolate them linearly, this paper shows that probabilities from various sources are better combined using a nonlinear estimator. Using the proposed model together with a dialogue-state-dependent semantic model yields a significant improvement when applied to a Thai interactive hotel reservation agent (TIRA), the first spoken dialogue system in the Thai language.
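    As a hedged illustration of the rescoring idea, the sketch below contrasts linear interpolation of the understanding and belief probabilities with a learned nonlinear combination. The small MLP here is only a stand-in for the paper's nonlinear estimator, and all names and training data are hypothetical.

```python
# Sketch: rescoring N-best semantic hypotheses by combining an
# understanding-component probability with a belief-derived probability.
# The MLP stands in for the paper's nonlinear estimator; data are hypothetical.
import numpy as np
from sklearn.neural_network import MLPRegressor

def linear_rescore(p_understand, p_belief, lam=0.5):
    # conventional approach: linear interpolation of the two probabilities
    return lam * p_understand + (1.0 - lam) * p_belief

def train_nonlinear_rescorer(p_understand, p_belief, correctness):
    # nonlinear combination learned from held-out hypotheses labelled as
    # correct (1) or incorrect (0) interpretations
    X = np.column_stack([p_understand, p_belief])
    model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000)
    model.fit(X, correctness)
    return model

def nonlinear_rescore(model, p_understand, p_belief):
    # higher combined score = more plausible semantic hypothesis
    return model.predict(np.column_stack([p_understand, p_belief]))
```

    The best hypothesis would then be the one with the highest combined score, e.g. `int(np.argmax(nonlinear_rescore(model, p_u, p_b)))`.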

    Classification-based spoken text selection for LVCSR language modeling

    Large vocabulary continuous speech recognition (LVCSR) is naturally in demand for transcribing daily conversations, while developing spoken text data to train LVCSR is costly and time-consuming. In this paper, we propose a classification-based method to automatically select social media data for constructing a spoken-style language model for LVCSR. Three classification techniques, SVM, CRF, and LSTM, trained on words and parts of speech, are compared for identifying the degree of spoken style in each social media sentence. Spoken-style utterances are chosen by incremental greedy selection based on the score of the SVM or CRF classifier, or on the output classified as "spoken" by the LSTM classifier. With the proposed method, only 51.8%, 91.6%, and 79.9% of the utterances in a Twitter text collection are marked as spoken utterances by the SVM, CRF, and LSTM classifiers, respectively. A baseline language model is then improved by interpolating it with one trained on these selected utterances. The proposed model is evaluated on two Thai LVCSR tasks: social media conversations and a speech-to-speech translation application. Experimental results show that all three classification-based data selection methods clearly help reduce the overall spoken test set perplexity. Regarding the LVCSR word error rate (WER), they achieve 3.38%, 3.44%, and 3.39% WER reduction, respectively, over the baseline language model, and 1.07%, 0.23%, and 0.38% WER reduction, respectively, over the conventional perplexity-based text selection approach.
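    The selection and interpolation steps can be sketched as follows, assuming a pre-trained spoken/written-style classifier (e.g. a scikit-learn LinearSVC with a text vectorizer) and simple linear interpolation of word probabilities. The threshold, identifiers, and classifier choice are illustrative rather than those used in the paper.

```python
# Sketch: greedy selection of spoken-style sentences by classifier score,
# then linear interpolation of the baseline LM with a spoken-style LM.
# `classifier`, `vectorizer`, and the 0.0 threshold are hypothetical.

def select_spoken_sentences(sentences, classifier, vectorizer, threshold=0.0):
    """Keep sentences the classifier scores as spoken-style, best first."""
    scores = classifier.decision_function(vectorizer.transform(sentences))
    ranked = sorted(zip(scores, sentences), reverse=True)
    return [sentence for score, sentence in ranked if score > threshold]

def interpolate(p_baseline, p_spoken, lam=0.5):
    """Linear interpolation of word probabilities from the two language models."""
    return lam * p_baseline + (1.0 - lam) * p_spoken
```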

    Pioneering a Thai Language Spoken Dialogue System

    This paper introduces a pioneering spoken dialogue system with Thai language interaction. The system targets an automatic hotel reservation task in a mixed-initiative scheme. Starting from an existing small speech corpus, an attempt to bootstrap the system for data collection is reported. Apart from the speech recognizer and synthesizer, the other submodules, including a language understanding module, a dialogue manager, and a text generation module, are implemented as easily extensible, rule-based components. Each submodule is described briefly. The first user evaluation achieves medium user satisfaction and serves as a good indicator for rapidly improving the current system.
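    A minimal, hypothetical sketch of such a rule-based pipeline (keyword-style slot filling feeding a dialogue manager); the slot names, rules, and prompts are illustrative and are not taken from the TIRA system.

```python
# Sketch: easily extensible rule-based understanding and dialogue management
# for a hotel-reservation task. Slots, rules, and prompts are hypothetical.
SLOT_RULES = {
    "room_type": ["single", "double", "suite"],
    "nights": ["night", "nights"],
}

def understand(utterance):
    """Very small rule-based language understanding: keyword -> slot value."""
    slots = {}
    tokens = utterance.lower().split()
    for slot, keywords in SLOT_RULES.items():
        for kw in keywords:
            if kw in tokens:
                slots[slot] = kw
    return slots

def dialogue_manager(state, slots):
    """Merge new slots into the dialogue state and choose the next prompt."""
    state.update(slots)
    if "room_type" not in state:
        return state, "What type of room would you like?"
    if "nights" not in state:
        return state, "How many nights will you stay?"
    return state, "Your reservation is confirmed."
```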